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Abstract — We present two graph-based algorithms for multiclass segmentation of high-dimensional data. The algorithms use a diffuse 
interface model based on the Ginzburg-Landau functional, related to total variation compressed sensing and image processing. A 
multiclass extension is introduced using the Gibbs simplex, with the functional 's double- well potential modified to handle the 
multiclass case. The first algorithm minimizes the functional using a convex splitting numerical scheme. The second algorithm is a 
uses a graph adaptation of the classical numerical Merriman-Bence-Osher (MBO) scheme, which alternates between diffusion and 
thresholding. We demonstrate the performance of both algorithms experimentally on synthetic data, grayscale and color images, 
and several benchmark data sets such as MNIST, COIL and WebKB. We also make use of fast numerical solvers for finding the 
eigenvectors and eigenvalues of the graph Laplacian, and take advantage of the sparsity of the matrix. Experiments indicate that 
the results are competitive with or better than the current state-of-the-art multiclass segmentation algorithms. 

Index Terms — segmentation, Ginzburg-Landau functional, diffuse interface, MBO scheme, graphs, convex splitting, image processing, 
high-dimensional data. 



I. Introduction 

Multiclass segmentation is a fundamental problem in com- 
puter vision and machine learning. We present a multiclass 
generalization of the graph-based segmentation procedure in- 
troduced in [5]. The model applies L2 gradient flow mini- 
mization of the Ginzburg-Landau (GL) diffuse interface energy 
functional to the case of functions defined on graphs. The algo- 
rithm in [5] performs binary segmentations in a semi-superised 
learning (SSL) framework, but multiclass tasks are solved by 
recursively applying a sequence of binary segmentations. Our 
new method rapidly and simultaneously segments multiclass 
benchmark data with higher accuracy than most previously 
existing methods, while avoiding the recurrent applications of 
the minimization procedure. 

This new formulation uses a phase-field representation of 
the GL energy functional: a vector- valued quantity is assigned 
to every data point such that each of its components represents 
the fraction of the phase, or class, present in that particular 
point. The components of the field variable add up to one, 
so the phase-field vector is constrained to lie on the Gibbs 
simplex. The phase-field representation is used in the material 
sciences to study the evolution of multi-phase systems [26] 
and has been employed before for multiclass image segmen- 
tation [36]. Nevertheless, to the best of our knowledge, this 
is the first time that a vector field GL representation in the 
context of functions defined on graphs is applied for multiclass 
semi- supervised classification of high dimensional data. 

In addition, we apply this Gibbs simplex idea to the 
graph-based Merriman-Bence-Osher (MBO) scheme devel- 
oped in [38]. The MBO scheme [39] is a well-established PDE 
method for evolving an interface by mean curvature. As with 
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the diffuse interface model, tools for nonlocal calculus [27] 
are used in [38] to generalize the PDE formulation to the 
graph setting. By introducing the phase-field representation to 
the graph-based MBO scheme, we develop another new and 
highly efficient algorithm for multiclass segmentation in a SSL 
framework. 

Thus, the main contributions of our work are: (i) to in- 
troduce two new graph-based methods for multiclass data 
segmentation, namely a multiclass GL formulation based on 
the binary representation described in [5] and a multiclass 
graph-based MBO that extends the model in [38]; and (ii) 
to present very efficient algorithms to segment multiclass 
geometric, image, and text data. 

The paper is organized as follows. In section II we discuss 
prior related work. In section III, we discuss the fundamentals 
of diffuse interface models in machine learning. We then 
describe our two algorithms in sections IV and V. In section 
VI, we present experimental results on benchmark data sets, 
demonstrating the effectiveness of our methods. Finally, in 
section VII, we conclude and discuss ideas for future work. 

II. Previous Work 

The discrete graph formulation of GL energy minimization 
may be understood as an example of the more general form 
of energy (or cost) functional for data classification used in 
the context of machine learning, 

EW) = \\iP\\ a + ti\\iP-j\\l (1) 

where the norm \\ip\\ a of the classification function ip is a 
regularization term, and ||^ — is a fidelity term, incor- 
porating most (supervised) or just a few (semi- supervised) of 
the known values The choice of the regularization norm 
|| • || a has non-trivial consequences in the final classification 
accuracy. In instances where an L 2 norm is used, the resulting 
cost functional is a tradeoff between accuracy in the classifi- 
cation of given labels vs. function smoothness. Although this 
encodes the reasonable assumption that the labels should vary 
smoothly almost everywhere, it is also important to preserve 



2 



the sharp discontinuities that may arise in the boundaries 
between classes. Hence the interest in formulations that can 
produce piecewise smooth solutions [8]. 

Graph-based regularization terms, expressed in terms of the 
discrete Laplace operator on graphs, are often used in simi- 
supervised learning as a way to exploit underlying similarities 
in the data set [3], [13], [52], [54]-[56]. Additionally, some 
of these methods use a matrix representation to generalize 
eq. (1) to the multiple class case [13], [52], [54], [56]. The 
rows in the matrix correspond to graph vertices and the 
columns to indicator functions for class membership: the class 
membership for vertex i is computed as the column with 
largest component in the zth row. The resulting minimization 
procedure is akin to multiple relaxed binary classifications 
running in parallel. This representation is different from the 
Gibbs simplex we use, as there is usually no requirement that 
the elements in the row add up to 1. 

An alternative regularization method for the graph-based 
multiclass setup is presented in [47], where the authors mini- 
mize a Kullback-Leibler divergence function between discrete 
probability measures that translates into class membership 
probabilities. 

Not all the methods deal directly with the multiple classes in 
the data set. A different approach is to reduce the multiclass 
case to a series of two-class problems and to combine the 
sequence of resulting sub-classifications. Strategies employed 
include recursive partitioning, hierarchical classification and 
binary encodings, among others. For example, Szlam and 
Bresson present a method involving Cheeger cuts and split 
Bregman iteration [28] to build a recursive partitioning scheme 
in which the data set is repeatedly divided until the desired 
number of classes is reached. In [29], a pairwise coupling 
is described, in which each two-class problem is solved and 
then a class decision is made combining the decisions of all 
the subproblems. In [19], a binary encoding approach of the 
class labels is used. 

Our methods, on the other hand, have roots in the contin- 
uous setting as they are derived via a variational formulation. 
Our first method comes from a variational formulation of 
the L 2 gradient flow minimization of the GL functional [5]. 
Our second method is built upon the MBO classical scheme 
to evolve interfaces by mean curvature [39]. The latter has 
connections with the work presented in [22], where an MBO- 
like scheme is used for image segmentation. The method 
is motivated by the propagation of the Allen-Cahn equation 
with a forcing term, obtained by applying gradient descent to 
minimize the GL functional with a fidelity term. 

Alternative variational principles have also been used for 
image segmentation. In [36], a multiclass labeling for image 
analysis is carried out by a multidimensional total variation 
formulation involving a simplex-constrained convex optimiza- 
tion. In that work, a discretization of the resulting PDEs is 
used to numerically solve for the minimization of the energy. 
In contrast, the discretization in both of our algorithms is given 
by posing the problem in a graph setting. We represent the 
data as nodes in a weighted graph, with each edge assigned 
a measure of similarity between the vertices it is connecting. 
Nonlocal calculus, such as that outlined in [27], is the tool used 



to generalize the continuous formulation to a discrete setting 
given by functions on graphs. This approach has successfully 
been used in other areas such as spectral graph theory [15], 
[40]. 

As pointed out in [5], there are interesting connections be- 
tween the GL functional on graphs and normalized graph cuts. 
Shi and Malik [45] pose the problem of image segmentation 
as the solution of a generalized eigensystem generated from 
a graph Laplacian. In [8], graph cuts are used to efficiently 
find local minima of a wide class of energies with vari- 
ous smoothness constraints for multiclass image restoration. 
Also, as mentioned earlier, the method in [48] is a recursive 
graph-based partition scheme. A multiclass algorithm for 
the transductive learning problem in high-dimensional data 
classification, described in [10], is based on i 1 relaxation 
of the Cheeger cut and the piecewise constant Mumford- 
Shah or Potts models. In [9], rigorous convergence results are 
presented for two algorithms that solve the relaxed Cheeger 
cut minimization used for unsupervised data clustering are 
presented. Our proposed methods are related to some of these 
approaches, but use the graph Ginzburg-Landau functional 
framework. 

In the continuous setting, it can be shown that the GL 
is a diffuse interface approximation to the total variational 
functional [21], [33], and analogous results have recently been 
proved in the graph setting as well [6]. This function is a 
natural framework for producing smooth labels everywhere 
while preserving sharp discontinuities, with the sharpness 
controlled by a diffuse interface parameter. The advantage of 
the diffuse interface model is that the energy functional is 
more tractable, and can be minimized by simpler numerical 
methods. 

III. Segmentation using the Ginzburg-Landau 
Functional on Graphs 

A. Ginzburg Landau Functional and Diffuse Interface Model 

The classical Ginzburg-Landau (GL) functional, originally 
proposed to describe physical phenomena such as liquid-gas 
transitions and superconductivity [46], can be written as 

GL{u) = ^J \Vu\ 2 dx + i J $(u)dx, (2) 

where u is a scalar field defined over a space of arbitrary 
dimensionality and representing the state of the phases in the 
system, V denotes the spatial gradient operator, <&(u) is a 
double- well potential, such as $(u) = \{u 2 — l) 2 and e is a 
positive constant. In the next subsection we derive in detail 
the proper formulation in a graph setting (see eq. (12)). 

The functional (2) is composed of two terms: a smoothing 
term that measures the differences in the components of 
the field, and a potential term that measures how far each 
component is from a specific value (±1 in the example 
above). Consequently, the minimization of the first term leads 
to smoother regions, while the minimization of the second 
penalizes variations from the minima of the double-well 
potential. Given initial conditions with states +1 and states 
— 1 distributed randomly in the domain, the minimization of 
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the GL functional entails an inherent conflict between the two 
terms in the functional, leading to the generation of a transition 
region: the diffuse interface. The smoothness of this diffuse 
interface is regulated by the parameter e. For small e, the state 
minimizing the functional contains sharp transitions between 
the minima of the double-well potential, while a large e gives 
more weight to the smoothing term so that the transitions are 
more gradual. 

It is shown in [33] that the e — >> limit of the GL functional, 
in the sense of T -convergence, is the Total Variation (TV) 
semi-norm, so one can write: 

GL(u)^ T \\u\\tv (3) 

The advantage of the GL functional is that its L 2 gradient flow 
leads to a linear differential operator, which allows us to use 
spectral methods for minimization. 

The diffuse interface description, with its controlled smooth- 
ness transition between phases, is attractive for segmentation 
problems due to its straightforward way of modeling the 
separation of a domain into regions or phases. It has been 
used successfully in image impainting [7], [20] and image 
segmentation [22]. The standard practice is to introduce an 
additional term in the energy functional to escape from trivial 
steady-state solutions (e.g., all labels taking on the same 
value). This leads to the expression 

E(u) = GL(u) + F(u,u), (4) 

where F is the additional term, usually called fidelity. This 
fidelity term allows the specification of any known information 
about the particular task at hand, for example, regions of an 
image that belong to a certain class. 

B. Application to Semi-Supervised Learning (SSL) on Graphs 

Inspired in part by the PDE-based imaging community, 
where variational algorithms combining ideas from spectral 
methods on graphs with nonlinear edge detection methods are 
common [27], Bertozzi and Flenner extended in [5] the L 2 
gradient flow of the Ginzburg-Landau (GL) energy functional 
to the domain of functions on a graph. 

The energy E(u) in (4) can be minimized in the L 2 sense 
using gradient descent. This leads to the following dynamic 
equation (modified Allen- Cahn equation): 

du SGL SF 1 . SF ^ 

where A represents the Laplacian operator. A local minimizer 
of the energy is obtained by evolving this expression to steady 
state. Note that E is not convex, and may have multiple local 
minima. 

Before continuing further, let us introduce some simple 
graph concepts we will be using in subsequent sections. 

1 ) Graph Framework for Large Data Sets: Let G be an 
undirected graph G = (V, E), where V and E are the 
sets of vertices and edges, respectively. The vertices are the 
building blocks of the data set, such as points in R n or 
pixels in an image. The similarity between vertices i and j 
is measured by a weight function w(i,j) that satisfies the 
symmetric property w(i,j)= w(j,i). A large value of w(i,j) 



indicates that vertices i and j are similar to each other, while 
a small w(i,j) indicates that they are dissimilar. For example, 
an often used similarity measure is the Gaussian function 
w(i,j) = exp(— d(i, j) 2 /a 2 ), with d(i,j) representing the 
Euclidean distance between the points associated with vertices 
i and j, and a 2 a positive parameter. 

Define W as the matrix Wij = w(i,j), and define the 
degree of a vertex i G V as 

di = ^2w(i,j). (6) 

jev 

If D is the diagonal matrix with elements di, then the graph 
Laplacian is defined as the matrix L = D — W. 

2) Ginzburg-Landau Functional on Graphs: Instead of the 
continuous domain of the original functional, the diffuse 
interface model of Bertozzi and Flenner expresses the L 2 flow 
for the discrete graph structure. Thus the graph serves as the 
domain in which the energy minimization takes place. 

Nonlocal calculus, such as that outlined in [27], provides a 
way to generalize the continuous GL formulation to the case 
of weighted graphs. It also shows that the Laplace operator 
is related to the graph Laplacian matrix defined above, and 
that the eigenvectors of the discrete Laplacian converge to 
the eigenvectors of the Laplacian [5]. However, to guarantee 
convergence to the continuum differential operator in the limit 
of large sample size, the matrix L must be correctly scaled [5]. 
Although there are several ways of doing this, the two methods 
used more often are the random walk Laplacian 

= D" 1 L = I-D- 1 W, (7) 

related to Markov processes [51], and the symmetric normal- 
ized Laplacian 

L s = D ^LD ~i = I - D~^WD~i (8) 

Since the symmetric property of L s allows for more efficient 
implementations, we will use this operator in all our compu- 
tations. 

Note that L s satisfies: 




for all u G W 1 . Here the subscript i refers to the i th coordinate 
of the vector, and the brackets denote the standard dot product. 
Note also that L s has nonnegative, real-valued eigenvalues, 
including 0. 

Likewise, it is important to point out that for tasks such 
as data classification, the use of a graph representation has 
the advantages of providing a way to deal with nonlinearly 
separable classes as well as simplifying the processing of high 
dimensional data. 

The GL functional on graphs is then expressed as 

GiW^^M + i^^-i) 2 , (ii) 

iev 
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where U{ is the (real- valued) state of node i. The first term 
replaces the gradient term in (2), and the second term is the 
double- well potential function with minima at ±1, appropriate 
for binary classifications. 

3) Semi-Supervised Learning (SSL) on Graphs: In the 
graph version of the GL functional, and in general in graph- 
based learning methods, the graph is constructed such that the 
edges represent the similarities in the data set and the nodes 
have an associated real state that encodes, with an appropriate 
thresholding operation, class membership. 

In addition, in some data sets, the label of a small fraction 
of data points is known beforehand. The possibility of using 
this a priori information considerably improves the learning 
accuracy, explaining in part the popularity of semi- supervised 
learning (SSL) methods. The graph generalization of the 
diffuse interface model handles this condition effortlessly by 
using the labels of known points. The GL functional for SSL 
is expressed as: 

— (?#.. I i..?* ) 4- — 



e(u) = -{uM + ^y^uf-iy 



iev 



4e ^ 

iev 

— [Ui-Ui] . 



(12) 



The final term in the sum is the new fidelity term that enforces 
those label values that are known beforehand. Hi is a parameter 
that takes the value of a positive constant /i if i is a fidelity 
node and zero otherwise, and ui is the known value of fidelity 
node i. 

Therefore, given an initial state Ui of each vertex i, the 
problem consists of minimizing the GL functional with fidelity 
term. The classes are then obtained by thresholding the values 
of Ui to the closer of either 1 or —1. The result is a 
very efficient method for binary data segmentation. In the 
following sections, the modifications introduced to generalize 
this functional to multiclass segmentation are described. 

IV. Multiclass Ginzburg-Landau Approach 
A. Extension to Multiclass Segmentation 

Given Np data points, we generalize the label vector u 
to a label matrix U = (ui, . . . , \\n d ) t . Rather than node i 
adopting a single state ui G M, it now adopts a composition 
of states expressed by a vector G R K where the kth 
component of is the strength with which it takes on class 
k. The matrix U has dimensions Np x K, where K is the 
total number of possible classes. 

For each node i, we require the vector to be an element 
of the Gibbs simplex E K , defined as 



(x u . . .,x K ) G [0, 1] 



K 



K 

E 

k=i 



1 



(13) 



Vertex k of the simplex is given by the unit vector e&, 
whose kth component equals 1 and all other components 
vanish. These vertices correspond to pure phases, where the 
node belongs exclusively to class k. Note that the simplex 
formulation has a straightforward probabilistic interpretation, 
with Ui representing the probability distribution over the K 
classes. 



The multiclass GL energy functional for the phase field 
approach on graphs is written as: 



E(XJ) 



where 



|(U,L s U) + l^fnjK-e fc || 2 Ll ) 

iev \k=i J 



iev 



H\\ , 



(U,L S U) 



trace(U T L s U), 



and \ii is a vector indicating prior class knowledge of sample 
i. We set iii = e& if node i is known to be in class k. 

As mentioned before, the first (smoothing) term in the GL 
functional (14) measures variations in the vector field. The 
simplex representation has the advantage that, like in Potts- 
based models but unlike in some other multiclass methods, 
the penalty assigned to differently labeled neighbors is in- 
dependent of the integer ordering of the labels. The second 
(potential) term drives the system closer to the vertices of the 
simplex, with the use of an Li norm preventing the emergence 
of an undesirable minimum at the center of the simplex, as 
would happen with an L2 norm for large K. This potential 
aims to provide a clear way to calculate class memberships, as 
the phase composition is purer near the vertices of the simplex. 
The compromise between the smoothing and potential terms 
is established through the constant e. The third (fidelity) term 
enables the encoding of a priori information. 

B. Energy Minimization 

Following [5], we use a convex splitting scheme to minimize 
the GL functional in the phase field approach. The energy 
functional (14) is decomposed into convex and concave parts 
as: 



E(U) — ^convex (U) + Concave (U) 

^convex(U) = | (U, L S U) + ^ (U, U) 

I K I 

^concave(U) = — ^ \\ - \\u { - e k f Li 

iev k=i 

+ £f N-<Ml a La -f 

iev 



(U,U) 



with C G R denoting a constant that is chosen to guarantee 
the convexity/concavity of the energy terms. Under the right 
conditions, this approach results in an unconditionally stable 
time-discretization scheme for gradient descent [22], [24], [53] 
of the form 

tfl+i+rft "™ (tfl+i) = ug-dt %^(C/« fe ). (15) 

OUik OUik 

Evaluating the functional derivatives, we write this equation 
in matrix form as 

U n+1 + dt (eL s U n+1 + CU n+1 ) 

= U n - dt f ^T n + /z(U n - U) - CU n J , (16) 
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where 



K K 



m=l 



(17) 



/x is a diagonal matrix with elements /i^, and U 

(ui,...,Ua/-J T . 

Solving (16) for U n+1 gives the iteration equation 



U n+1 = B 1 
where 



(1 + C dt) U n - ^ T n - dtv(U n - U) 



B = (l + Cdi)I + edfL a 



(18) 
(19) 



This implicit scheme allows the evolution of U to be numer- 
ically stable regardless of the time step dt, in spite of the 
numerical "stiffness" of the underlying differential equations 
which could otherwise force dt to be impractically small. 
(Note, though, that stability is a separate issue from accuracy.) 

In general, after the update, the phase field is no longer on 
the Y> K simplex. Consequently, we use the procedure in [14] 
to project back to the simplex. 

Computationally, the schemes' numerical efficiency is in- 
creased by using a low-dimensional subspace spanned by 
only a small number of eigenf unctions. Let X be the matrix 
of eigenvectors of L s and A be the diagonal matrix of 
corresponding eigenvalues. We now write L s as its eigende- 
composition L s = XAX T , 



X[(l + C<te)I + edtA] X 1 



(20) 



but we approximate X by a truncated matrix retaining only 
N e eigenvectors (N e <C Np), to form a matrix of dimension 
Np x N e . The term in brackets is simply a diagonal N e x N e 
matrix. This allows B to be calculated rapidly, but more im- 
portantly it allows the update step (18) to be decomposed into 
two significantly faster matrix multiplications (as discussed 
below), while sacrificing little accuracy in practice. 

For initialization, the phase compositions of the fidelity 
points are set to the vertices of the simplex corresponding 
to the known labels, while the phase compositions of the rest 
of the points are set randomly. 

The energy minimization proceeds until a steady state 
condition is reached. Once the change of the norm of the 
vector field in subsequent iterations falls below a threshold, 
the system is no longer evolving and the energy decrement is 
negligible. Consequently, the calculation is stopped when 



max, ||u; n+1 -u^" 2 
max ?: ||u^+ 1 || 2 



< rj, 



(21) 



where r\ represents a given small positive constant. The final 
classes are obtained by assigning class k to node i if is 
closest to vertex on the Gibbs simplex. 

The specific algorithm is outlined in Figure 1. Note that 
other operator splitting methods have been studied for mini- 
mization problems (e.g. [36]). Ours however has the following 
advantages: (i) it is direct (i.e. it does not require the solution 
of further minimization problems), (ii) the resolution can 
be adjusted by increasing the number of eigenvectors N e 



used in the representation of the phase field, and (iii) it has 
low complexity. To see this final point, observe that each 
iteration of the multiclass GL algorithm has only 0(NdKN c ) 
operations for the main loop, since matrix Z in Figure 1 only 
has dimensions N e x K, and then 0(NdK log K) for the 
projection to the simplex. Usually, N e <C N D and K <C N D , 
so the dominant factor is simply the size of the data set . In 
addition, it is generally the case that the number of iterations 
required for convergence is moderate (around 200 iterations). 
Thus, practically speaking, the complexity of the algorithm is 
linear. 

V. MBO Reduction of the Ginzburg-Landau 
Energy Functional 

Given that the modified Allen-Cahn equation (5), resulting 
from the L 2 gradient flow of the GL energy functional, pro- 
duces an efficient method for data segmentation when adapted 
to function estimation on graphs, a natural question is what 
other adaptations of this formulation can be equally effective 
when expressed in the graph domain. The immediate answer 
comes from the relation between the Allen-Cahn equation and 
the motion by mean curvature. 

Let us start by reviewing this connection in the contin- 
uous setting. In [39], Merriman, Bence and Osher propose 
an algorithm to approximate motion by mean curvature, or 
motion in which normal velocity equals mean curvature, using 
threshold dynamics. The authors note that if one applies the 
heat equation to an interface, then the diffusion blunts the 
sharp points of the boundary, but has very little effect on 
the flatter regions. Therefore, one can imagine that diffusion 
creates some sort of motion by mean curvature, providing that 
we specify the boundaries of the moving set. 

Given a phase field u(z, t), consider the basic (unmodified) 
Allen-Cahn equation, namely equation (5) without the fidelity 
term: 

^ = eAu--<f>'(u). (22) 
dt e v ' 

For small values of e, the following time- splitting scheme can 

be used numerically to evolve the Allen-Cahn equation: 

1) The first step is propagation using: 

du 
dt 

2) The second step is propagation using: 



eAu 



(23) 



(24) 



Note, however, that in the e — >> limit, the second step is 
simply thresholding [39]. Thus, as e — >> 0, the time splitting 
scheme above consists of alternating between diffusion and 
thresholding steps. 

It has been shown [44] that in the limit e — >> 0, the rescaled 
solutions u e (z,t/e) of (22) yield motion by mean curvature 
of the interface between the two phases of the solutions. This 
motivates the two sequential steps of the MBO scheme: 

1) Diffusion. Let u n+ ^ = S(St)u n where S(St) is the 
propagator (by time St) of the standard heat equation: 
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Fig. 1: Multiclass GL algorithm 



Require: e, dt, N D , 7V e , K, //, U, A, X 
Ensure: out = U end 

Y<- [(l + Ccfe)I + ed*A] _1 X T 
for i = 1 -> N D do 

<- rand((0, 1)), ^ <- projectTo Simplex^). If ^ > 0, U£ <- 17. ° 
end for 
n <- 1 

while Stop criterion not satisfied do 
for / = 1 7V D , fc = 1 -> K do 



T i/T <- E^i \ (1 - 24/3) ||iK n - ep\\ Li n 
end for 



K 



Z ^ Y 



(1 + C dt) u n - f T n - ^/x(U n - U) 

u n+1 <- xz 

for i = 1 N D do 

u^ n+1 ^ pro jectTo Simplex (\ii n+l ) 
end for 
n <— n + 1 
end while 



2) Thresholding. Let 



1 if i/^+i > 0, 
-1 ifu n +^ < 0. 



Barles [2] and Evans [23] have proven rigorously that this 
scheme approximates motion by mean curvature. 

A. Graph Formulation 

The motion by mean curvature of the MBO scheme can be 
generalized to the case of functions on a graph in much the 
same way as the procedure followed for the modified Allen- 
Cahn equation (5) in [5]. Merkurjev et al. have pursued this 
idea in [38], where a modified MBO scheme on graphs has 
been applied to the case of binary segmentation. The authors 
apply a two-step time splitting scheme to (5) so that the second 
step is the same as the one in the original MBO scheme, and 
then replace the term with a more general graph term 
— L s u. The discretized version of the algorithm is: 

1) Heat equation with forcing term: 



u 



n+i 



dt 



-L s u n - /x(n n -u). (26) 



2) Thresholding: 



if >0, 

if u 7 !^ < o. 



Here, after the second step, u f can take only two values of 1 or 
— 1; thus, this method is appropriate for binary segmentation. 
Note that the fidelity term scaling can be different from the 
one in (5). 

In practice, it can be productive to repeat the diffusion 
step a number of times before thresholding. In order to keep 



the convention that one iteration of the diffusion-thresholding 
procedure corresponds to one time step, we divide dt by the 
number of diffusion steps per iteration, which we denote TV^. 

B. Extension to Multiclass Segmentation 

Using the standard Gibbs- simplex E K defined in Section 
IV, the multiclass extension of the algorithm in [38] is straight- 
forward. The notation is the same as in the previous section: 
we use a matrix U to represent the phase composition of 
nodes. The second step of the algorithm is modified, however, 
so that the thresholding is converted to the displacement of the 
vector field variable towards the closest vertex in the Gibbs 
simplex. In other words, now in the second step the row vector 
u^ n+ 2 of step 1 is projected back to the simplex (using the 
approach outlined in [14] as before) and then a pure phase 
given by the vertex in the Yi K simplex closest to u^ n+ 2 is 
assigned to be the new phase composition of node i. 

In summary, the new algorithm consists of alternating be- 
tween the following two steps to obtain approximate solutions 
U n at discrete times: 

1) Heat equation with forcing term: 

\J n +\ - U n 

= -L s U n - /z(U n - U). (27) 



dt 

2) Thresholding: 



u 



n+l 



(28) 



where vertex is the vertex in the simplex closest to 

projectToSimplex(\ii n +2). 
As for the multiclass GL algorithm, when a label is known, it 
is represented by the corresponding vertex in the Y> K simplex. 
The final classification is achieved by assigning node i to class 
k if if the kth component of is one. Again, as in the binary 
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Fig. 2: Multiclass MBO Algorithm 



Require: dt, Np, 7V e , Ns, U, A, X 
Ensure: out = U end l 

i+^a)" 

1 — » N]j do 



Y <- 
for i 



<- rand((0, 1)), u^ <- projectToSimplex(ui°). If ^ > 0, <- 
end for 
n <- 1 

while Stop criterion not satisfied do 
for 5 = 1 -> Afe do 

Z ^ Y [u n - ^ Ax(U n - U) 

U n+1 <- XZ 
end for 

for i = 1 Nd do 

Ui n+1 ^— pro jectTo Simplex (u^ n+1 ) 
u^ n+1 efe, where k is closest simplex vertex to u^ n+1 
end for 
n <— n + 1 
end while 



case, the diffusion step can be repeated a number of times 
before thresholding and when that happens, dt is divided by 
the number of diffusion iterations N s . 

As in the previous section, we use an implicit numerical 
scheme. For the MBO algorithm, the procedure involves 
modifying (27) to apply L s to U n+ 2 instead of to U n . This 
gives the diffusion step 



where 



B 1 U n -^/x(U n -U) 



B = I + dt~L 8 . 



As before, we use the eigendecomposition L s 
write 

B = X(I + cfeA) X T , 



(29) 

(30) 
XAX T to 

(31) 



which we approximate using the first N e eigenfunctions. The 
initialization procedure and the stopping criterion are the same 
as in the previous section. 

The multiclass MBO algorithm is summarized in Figure 2. 
Its complexity is 0(NdKN c Ns) operations for the main 
loop, 0(N D K log K) operations for the projection to the 
simplex and O(NdK) operations for thresholding. As for 
the multiclass GL algorithm, N e <C N D and K <C Njj- 
Furthermore Ns needs to be set to three, and due to the 
thresholding step, we find that extremely few iterations (e.g., 
6) are needed to reach steady state. Thus, in practice, the 
complexity of this algorithm is linear as well, and typical 
runtimes are very rapid as shown in Table III. 

VI. Experimental Results 

We have tested our algorithms on synthetic data, grayscale 
and color images, and the MNIST, COIL and WebKB bench- 
mark data sets. In all cases, we compute the symmetric 



normalized graph Laplacian matrix L s , of expression (8), 
using A/'-neighborhood graphs: in other words, vertices i and 
j are connected only if i is among the N nearest neighbors of 
j or if j is among the N nearest neighbors of i. Otherwise, 
we set w(i,j) = 0. This results in a sparse matrix, making 
calculations and algorithms more tractable. In addition, for the 
similarity function we use the local scaling weight function of 
Zelnik-Manor and Perona [43], defined as: 



w(ij) exp 



(32) 



where d(i,j) is some distance measure between vertices i and 
j, such as the L 2 distance, and y/r(i) = d(i, k) defines a local 
value for each vertex i, parametrized by M, with k being the 
index of the Mth closest vertex to i. 

With the exception of the images, all the results and 
comparisons with other published methods are summarized 
in Tables I and II. Due to the arbitrary selection of the 
fidelity points, our reported values correspond to averages 
obtained over 10 runs with different random selections. The 
timing results and number of iterations of the two methods 
are shown in Tables III and IV, respectively. The methods 
are labeled as "multiclass GL" and "multiclass MBO". These 
comparisons show that our methods exhibit a performance that 
is competitive with or better than the current state-of-the-art 
segmentation algorithms. 

Parameters are chosen to be compatible between the meth- 
ods. For the multiclass GL method, the convexity constant 
used is: C = fi + \. This is the minimum constant that 
guarantees the convexity and concavity of the terms in the 
energy partition of the convex splitting strategy employed. 
For the multiclass MBO method, as discussed in the previous 
section, the diffusion step can be repeated a number of times 
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before thresholding. In all of our results, we run the diffusion 
step three times before any thresholding is done (Ns = 3). 

To compute the eigenvectors and eigenvalues of the sym- 
metric graph Laplacian, we use fast numerical solvers. As we 
only need to calculate a portion of the eigenvectors to get 
good results, we use the Rayleigh-Chebyshev procedure of 
[1] for computing all the eigendecompositions. This numerical 
solver is especially efficient for producing a few of the smallest 
eigenvectors of a sparse symmetric matrix. For example, for 
the MNIST data set of 70,000 images, it was only necessary 
to calculate 300 eigenvectors, which is less than 0.5% of the 
data set size. This is one of the factors that makes our methods 
very efficient. 

A. Synthetic Data 

The synthetic data set we tested our method against is the 
three moons data set. It is constructed by generating three half 
circles in R 2 . The two half top circles are unit circles with 
centers at (0,0) and (3,0). The bottom half circle has radius 
1.5 and the center at (1.5, 0.4). Five hundred points from each 
of those three half circles are sampled and embedded in M 100 
by adding Gaussian noise with standard deviation of 0.14 to 
each of the 100 components of each embedded point. The 
dimensionality of the data set, together with the noise, makes 
segmentation a significant challenge. 

The weight matrix of the graph edges was calculated using 
N = 10 nearest neighbors and local scaling based on the 17 th 
closest point (M = 17). The fidelity term was constructed by 
labeling 25 points per class, 75 points in total, corresponding 
to only 5% of the points in the data set. 

The multiclass GL method used the following parameters: 
20 eigenvectors, e = 1, dt = 0.1, fi = 30, r\ — 10 -7 . The 
method was able to produce an average of 98.4% of correct 
classification, with a corresponding computation time of 0.16 
s per run on a 2.4 GHz Intel Core i2 Quad without any parallel 
processing. 

Analogously, the multiclass MBO method used the follow- 
ing parameters: 20 eigenvectors, dt = 0.1, /i = 30, r] = 10 -7 . 
It was able to segment an average of 99.12% of the points 
correctly over 10 runs with only 3 iterations and about 0.01 s 
of computation time. One of the results obtained is shown in 
Figure 3. 

Table I gives published results from other related methods, 
for comparison. Note that the results for p-Laplacians [11], 
Cheeger cuts [48] and binary GL are for the simpler binary 
problem of two moons (also embedded in M 100 ). While, 
strictly speaking, these are unsupervised methods, they all 
incorporate prior knowledge such as a mass balance constraint. 
We therefore consider them comparable to our SSL approach. 
The "tree GL" method [25] uses a scalar multiclass GL 
approach with a tree metric. It can be seen that our methods 
achieve the highest accuracy on this test problem. 

B. Image Segmentation 

a) Grayscale (single channel) Image: We tested our 
algorithms on the image of figures shown in Figure 4a. This 
is a 191 x 196 pixel grayscale image, to be divided into 




Fig. 3: Segmentation of three moons using multiclass MBO 
(98.4667% correct). 

five classes: black, dark gray, medium gray, light gray and 
white. To construct the weight matrix, we use feature vectors 
corresponding to a pixel's ^-coordinate, ^-coordinate, and 
intensity. For the fidelity term, 1,500 or 4% of the points were 
randomly chosen. 

A local scaling graph with N = 30 and M = 30 is 
constructed and used by both our algorithms. 

The multiclass Ginzburg-Landau method was applied using 
the following parameters: 10 eigenvectors, e = 1, dt = 0.5, 
fi = 50 and r] = 10 -7 . It was able to segment the classes 
perfectly, with an average time of 4.1 s. The original image 
as well as the segmentation are included in Figure 4. In each 
segmentation, the white regions correspond to the identified 
class. 

The multiclass MBO method used the following parameters: 
10 eigenvectors, dt = 0.5, /i = 50, r] = 10 -7 . The algorithm 
was able to segment all the points correctly in 2 iterations and 
0.232 s. 

We compare our result to the method of Li and Kim 
[37], which also segments the image perfectly. However, their 
method requires additional information, such as the densities 
of each class, that we do not need in our method. 

b) Color (multichannel) Image: We then tested our al- 
gorithms on the color image of cows shown in Figure 5 a. This 
is a 213 x 320 color image, to be divided into four classes: 
sky, grass, black cow and red cow. 

To construct the weight matrix, we use feature vectors 
defined as the set of intensity values in the neighborhood of a 
pixel. The neighborhood is a patch of size 5x5. Red, green 
and blue channels are appended, resulting in a feature vector 
of dimension 75. A local scaling graph with N = 30 and 
M = 30 is constructed for both algorithms. For the fidelity 
term, 2.6% of labeled pixels are used. 

The multiclass Ginzburg-Landau method used the following 
parameters: 30 eigenvectors, e = 1, dt = 0.1, /x = 50 and 
r] — 10 -7 . The average time for segmentation using different 
fidelity sets was 19.9 s. 

The multiclass MBO method used the following parameters: 
30 eigenvectors, dt = 0.1, \i = 50, r] = 10 -7 . The average 
time for segmentation over 10 runs was around 1.2 s, and there 
were 11 iterations. 

One of the results of each of our two methods (using the 
same fidelity set) is depicted in Figure 5. It can be seen that 
both methods are able to identify all the classes, with almost 
no perceptible difference between the two results. Most of the 



TABLE I: Results for benchmark data sets: Moons, MNIST, COIL and WebKB 

MNIST 



Two/Three moons 



Method 


Accuracy 


spectral clustering [25] 


80% 


p-Laplacian [11] 


94% 


Cheeger cuts [48] 


95.4% 


tree GL [25] 


97.4% 


binary GL [5] 


97.7% 


multiclass GL 


98.4% 


multiclass MBO 


99.12% 



Method 


Accuracy 


p-Laplacian [11] 


87.1% 


multicut normalized 1-cut [30] 


87.64% 


linear classifiers [34], [35] 


88% 


Cheeger cuts [48] 


88.2% 


boosted stumps [32], [35] 


92.3-98.74% 


transductive classification [49] 


92.6% 


tree GL [25] 


93.0% 


fc-nearest neighbors [34], [35] 


95.0-97.17% 


neural/convolutional nets [16], [34], [35] 


95.3-99.65% 


nonlinear classifiers [34], [35] 


96.4-96.7% 


multiclass GL 


96.8% 


multiclass MBO 


96.91% 


SVM [18], [34] 


98.6-99.32% 



COIL 



WebKB 



Method 


Accuracy 


k -nearest neighbors [47] 


83.5% 


LapRLS [3], [47] 


87.8% 


sGT [31], [47] 


89.9% 


SQ-Loss-I [47] 


90.9% 


MP [47] 


91.1% 


multiclass GL 


91.4% 


multiclass MBO 


91.46% 



Method 


Accuracy 


vector method [12] 


64.47% 


fc-nearest neighbors (k = 10) [12] 


72.56% 


centroid (normalized sum) [12] 


82.66% 


naive Bayes [12] 


83.52% 


SVM (linear kernel) [12] 


85.82% 


multiclass GL 


87.3% 


multiclass MBO 


88.48% 



TABLE II: WebKB results with varying fidelity percentage 



Method 


10% 


15% 


20% 


25% 


30% 


WebKB results for Multiclass GL (% correct) 


81.5% 


84.2% 


85.4% 


86.7% 


87.3% 


WebKB results for Multiclass MBO (% correct) 


83.71% 


85.75% 


86.81% 


87.74% 


88.48% 



TABLE III: Comparison of timings (in seconds) 



Data set 


three moons 


grayscale image 


color image 


MNIST 


COIL 


WebKB 


Graph Calculation 


0.771 


19.96 


645.34 


6183.1 


0.95 


399.35 


Eigenvector Calculation 


0.331 


210.10 


190.93 


1683.5 


0.19 


64.78 


Multiclass GL 


0.163 


4.08 


19.92 


811.5 


2.31 


6.97 


Multiclass MBO 


0.013 


0.23 


1.20 


15.4 


0.03 


0.05 



TABLE IV: Comparison of number of iterations 



Data set 


three moons 


grayscale image 


color image 


MNIST 


COIL 


WebKB 


Multiclass GL 


140 


90 


200 


460 


700 


275 


Multiclass MBO 


3 


2 


11 


7 


6 


7 



TABLE V: Confusion Matrix for the MNIST Data Segmentation - Multiclass GL 



Obtained/True 





1 


2 


3 


4 


5 


6 


7 


8 


9 





6844 


1 


43 


4 


3 


16 


21 


2 


19 


19 


1 


6 


7809 


36 


8 


35 


2 


14 


62 


58 


15 


2 


5 


22 


6733 


45 


2 


4 


1 


27 


19 


7 


3 





3 


20 


6882 


1 


91 





1 


89 


92 


4 


1 


16 


6 


2 


6626 


4 


7 


15 


28 


75 


5 


9 





3 


75 





6072 


28 


2 


125 


14 


6 


31 


5 


11 


3 


23 


65 


6802 





31 


4 


7 


2 


16 


108 


45 


11 


7 





7078 


19 


110 


8 


1 


2 


22 


42 


4 


15 


3 


2 


6365 


20 


9 


4 


3 


8 


35 


119 


37 





104 


72 


6602 
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TABLE VI: Confusion Matrix for the MNIST Data Segmentation - MBO Scheme 



Obtained/True 





1 


2 


3 


4 


5 


6 


7 


8 


9 





6844 


20 


41 


3 


3 


15 


21 


1 


20 


17 


1 


5 


7789 


32 


8 


34 


1 


14 


63 


51 


14 


2 


5 


22 


6731 


42 


2 


4 


1 


23 


19 


8 


3 





3 


20 


6890 


1 


86 





1 


81 


90 


4 


1 


17 


6 


2 


6625 


3 


7 


12 


28 


67 


5 


9 





3 


70 





6077 


28 


2 


109 


14 


6 


31 


5 


11 


3 


22 


69 


6800 





29 


5 


7 


2 


16 


117 


44 


12 


9 





7093 


20 


101 


8 


2 


2 


21 


46 


4 


17 


5 


2 


6398 


22 


9 


4 


3 


8 


33 


121 


32 





96 


70 


6620 



mistakes made correspond to identifying some borders of the 
red cow as part of the black cow. 





(a) Original Image 



(b) Class 1 





(c) Class 2 



(d) Class 3 




> 



(e) Class 4 (f) Class 5 

Fig. 4: Grayscale image segmentation 



C. MNIST Data 

The MNIST data set [35] is composed of 70, 000 28 x 28 
images of handwritten digits through 9. Examples of entries 
can be found in Figure 6. The task is to classify each of the 
images into the corresponding digit. The images include digits 
from to 9; thus, this is a 10 class segmentation problem. 

To construct the weight matrix, we used N = 8 nearest 
neighbors with local scaling based on the 8 th closest neighbor 
(M = 8). Note that we perform no preprocessing, i.e. the 
graph is constructed using the 28 x 28 images. For the fidelity 
term, 250 images per class (2500 images corresponding to 
3.6% of the data) are chosen randomly. 

The multiclass GL method used the following parameters: 
300 eigenvectors, e = 1, dt = 0.15, /i = 50 and r] = 10 -7 . 
The complete set of 70,000 images was segmented with an 




(a) Original Image 




(b) Black Cow: multiclass GL (c) Black Cow: multiclass MBO 




(d) Red Cow: multiclass GL (e) Red Cow: multiclass MBO 




(f) Grass: multiclass GL (g) Grass: multiclass MBO 




(h) Sky: multiclass GL (i) Sky: multiclass MBO 

Fig. 5: Color image segmentation 
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average accuracy of 96.8% of the digits classified correctly in 
an average time of 811 s. The averages are obtained over 10 
runs. The confusion matrix for the best result obtained is in- 
cluded in Table V. Most of the mistakes were in distinguishing 
digits 4 and 9, and digits 5 and 8. 

The multiclass MBO method used the following parameters: 
300 eigenvectors, dt = 0.15, fi = 50, r] = 10 -7 . The algorithm 
was able to segment an average of 96.91% of the digits 
correctly over 10 runs in only 4 iterations and 15.382 s. We 
display the confusion matrix in Table VI. Note that most of 
the mistakes were in distinguishing digits 4 and 9, and digits 
2 and 7. 

Table I compares our results with those from other methods 
in the literature. As with the three moon problem, some 
of these are based on unsupervised methods but incorporate 
enough prior information that they can fairly be compared with 
SSL methods. The methods of linear/nonlinear classifers, k- 
nearest neighbors, boosted stumps, neural and convolutional 
nets and SVM are all supervised learning approaches, taking 
60,000 of the digits as a training set and 10,000 digits as a 
testing set [35], in comparison to our SSL approaches where 
we take only 3.6% of the points for the fidelity term. Our 
algorithms are nevertheless competitive with, and in most 
cases outperform, these supervised methods. Moreover, we 
perform no preprocessing or initial feature extraction on the 
image data, unlike most of the other methods we compare with 
(we did exclude from the comparison, however, methods that 
explicitly deskewed the image). While there is a computational 
price to be paid in forming the graph when data points use all 
784 pixels as features (see graph calculation time in Table III), 
this is a one-time operation that conceptually simplifies our 
approach. 
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Fig. 6: Examples of digits from the MNIST data base 

D. COIL dataset 

We evaluated our performance on the benchmark COIL 
data set [13], [41]. This is a set of color 128 x 128 images 
of 100 objects, taken at different angles. The red channel 
of each image was then downsampled to 16 x 16 pixels by 
averaging over blocks of 8 x 8 pixels. Then 24 of the objects 
were randomly selected and then partitioned into six classes. 
Discarding 38 images from each class leaves 250 per class, 
giving a data set of 1500 data points. 

To construct the weight matrix, we used N = 4 nearest 
neighbors with local scaling based on the A th closest neighbor 



(M = 4). The fidelity term was constructed by labeling 10% 
of the points, selected at random. 

For multiclass GL, the parameters were: 50 eigenvectors, 
e = 1, dt = 0.2, ii = 100 and rj = 10" 7 . This resulted in 
91.4% of the points classified correctly (average) in 2.3 s. 

For multiclass MBO, the parameters were: 50 eigenvectors, 
dt = 0.2, fi = 100, r] — 10 -7 . We obtained an accuracy 
of 91.46%, averaged over 10 runs. The procedure took 6 
iterations and 0.03 s. 

Comparative results reported in [47] are shown in Table I. 
These are all SSL methods (with the exception of fc-nearest 
neighbors which is supervised), using 10% fidelity just as we 
do. Our results are of comparable or greater accuracy. 

E. WebKB dataset 

Finally, we tested our methods on the task of text clas- 
sification on the WebKB data set [17]. This is a collection 
of webpages from Cornell, Texas, Washington and Wisconsin 
universities, as well as other miscellaneous pages from other 
universities. The webpages are to be divided into four classes: 
project, course, faculty and student. The data set is prepro- 
cessed as described in [12]. 

To construct the weight matrix, we used 575 nearest neigh- 
bors. Tfidf term weighting [12] is used to represent the website 
feature vectors. They were then normalized to unitary length. 
The weight matrix points are calculated using cosine similarity. 

For the multiclass GL method, the parameters were: 250 
eigenvectors, e = 1, dt = 1, /i = 50 and r] = 10 -7 . The 
average accuracies obtained are: 81.5%, 84.2%, 85.4%, 86.7% 
and 87.3% over fidelity sets of 10%, 15%, 20%, 25% and 30% 
of the points, respectively. The average computation time is 
6.97 s. 

For the multiclass MBO method, the parameters were: 250 
eigenvectors, dt = 1, /i = 4, 77 = 10 -7 . We obtained average 
accuracies of: 83.71%, 85.75%, 86.81%, 87.74% and 88.48% 
over fidelity sets of 10%, 15%, 20%, 25% and 30% of the 
points, respectively. The procedure took an average of 0.05 s 
and 7 iterations. 

We compare our results with those of several supervised 
learning methods reported in [12], shown in Table I. For 
these methods, two thirds of the data was used for training, 
and one third for testing. Our SSL methods obtain higher 
accuracy, using only 20% fidelity (for multiclass MBO). Note 
also that a larger sample of points for the fidelity term reduces 
the error in the classification results, as shown in Table II. 
Nevertheless, the accuracy is high even for the smallest fidelity 
sets. Therefore, the methods appear quite adequate for the 
SSL setting where only a few labeled data points are known 
beforehand. 

Multiclass GL and MBO: All the results reported point 
out that both multiclass GL and multiclass MBO perform well 
in terms of data segmentation accuracy. While the ability to 
tune multiclass GL can be an advantage, multiclass MBO is 
simpler and, in our examples, displays even better performance 
in terms of its greater accuracy and tiny number of iterations 
required. The relative strength and speed of multiclass MBO 
may not always hold, but the avoidance of a nonconvex 
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functional minimization that takes place in multiclass GL may 
explain the accuracy and speed increase. Exploring the under- 
lying connections of the energy evolution of these methods and 
the energy landscape for the relaxed Cheeger cut minimization 
recently established in [9] are to be explored in future work. 

VII. Conclusions 

We have presented two graph-based algorithms for mul- 
ticlass classification of high dimensional data. The two al- 
gorithms are based on the diffuse interface model using the 
Ginzburg-Landau functional, and the multiclass extension is 
obtained using the Gibbs simplex. The first algorithm min- 
imizes the functional using gradient descent and a convex- 
splitting scheme. The second algorithm executes a simple 
scheme based on an adaptation of the classical numerical MBO 
method. It uses fewer parameters than the first algorithm, and 
while this may in some cases make it more restrictive, in our 
experiments it was highly accurate and efficient. Both of these 
algorithms demonstrate how methods motivated by the PDE 
literature can be productively adapted to graphs, producing 
effective multiclass data segmentation methods. 

Testing the algorithms on synthetic data, images and bench- 
mark data sets shows that the results are competitive with 
or better than some of the most recent and best published 
algorithms in the literature. In addition, our methods have 
several advantages. First, they are simple and efficient, avoid- 
ing the need for intricate function minimizations or heavy 
preprocessing of data. Second, a relatively small proportion of 
fidelity points is needed for producing an accurate result. For 
most of our data sets, we used at most 10% of the data points 
for the fidelity term; for synthetic data and the two images, 
we used no more than 5%. Furthermore, the minimization 
obtained is not sensitive to the initial state. As long as the 
fidelity set contains samples of all classes in the problem, a 
random initialization is enough to produce good multiclass 
segmentation results. Finally, our methods do not use one-vs- 
all or sequences of binary segmentations that are needed for 
some other multiclass methods. We therefore avoid the bias 
and extra processing that is often inherent in those methods. 

Our algorithms benefit from the sparsity of the neigh- 
borhood graphs generated by the local scaling procedure of 
Perona and Zelnik-Manor [43], as well as from the fast 
numerical Rayleigh-Chebyshev method of Anderson [1] used 
to find a subset of eigenvalues and eigenvectors of the resulting 
graph Laplacian matrices. 

In all of the data sets we have studied, multiclass MBO 
performed better than multiclass GL in terms of accuracy and 
convergence time. Nevertheless, we anticipate that more intri- 
cate geometries could impair its effectiveness. In those cases, 
multiclass GL might still perform well, due to the additional 
control provided by tuning e to increase the thickness of the 
interfaces, producing smoother decision functions. 
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